About Me


• Sr. Software Engineer, Streaming Team @ Uber

o Streaming team supports the platform for real-time data analytics: Kafka, Samza, Flink, Pinot, and plenty more
o Focused on scaling Kafka at Uber's pace

• Staff Software Engineer @ eBay

o Build & scale eBay's cloud using OpenStack

• Apache Kylin: Committer, Emeritus PMC 



Agenda 


• Real time Use Cases 

• Kafka Infrastructure Deep Dive 

• Our own Development: 
o Rest Proxy & Clients 
o Local Agent 

o uReplicator (MirrorMaker)
o Chaperone (Auditing) 

• Operations/Tooling 



Important Use Cases 











Real-time Machine Learning - UberEats ETD 








Fraud detection 
Share my ETA 


And many more... 
Apache Kafka is Uber's Lifeline



Kafka ecosystem @ Uber 
Kafka cluster stats


• 100s of billions of messages/day

• 100s of TB/day

• Multiple data centers



Kafka Infrastructure Deep Dive 



Requirements 


• Scale from 100s of billions of messages/day to 1 trillion/day

• High Throughput (scale: 100s of TB to PB)

• Low Latency for most use cases (<5 ms)

• Reliability: 99.99% (#Msgs Available / #Msgs Produced)

• Multi-Language Support 

• Tens of thousands of simultaneous clients. 

• Reliable data replication across DC 
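
To make the targets above concrete, here is a rough back-of-the-envelope calculation. The function names are illustrative only, and the figures assume traffic is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average per-second rate for a daily total (assumes a flat traffic profile)."""
    return per_day / SECONDS_PER_DAY


def loss_budget(produced: int, available_per_10k: int = 9999) -> int:
    """Messages that may go missing while still meeting the reliability target.

    Reliability is #Msgs Available / #Msgs Produced; 99.99% is 9999 per 10,000.
    Integer arithmetic avoids floating-point rounding on large counts.
    """
    return produced * (10_000 - available_per_10k) // 10_000


if __name__ == "__main__":
    print(f"1 trillion msgs/day ~ {per_second(10**12):,.0f} msgs/s on average")
    print(f"1 PB/day ~ {per_second(10**15) / 1e9:.2f} GB/s on average")
    print(f"99.99% of 1 trillion leaves a budget of {loss_budget(10**12):,} msgs/day")
```

Peak rates would be higher than these averages, so capacity planning would need headroom on top.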




Kafka Pipeline



Kafka Pipeline: Data Flow



Kafka Clusters


• Use case based clusters
o Data (async, reliable)
o Logging (high throughput)
o Time Sensitive (low latency, e.g. Surge, Push notifications)
o High Value Data (at-least-once, sync, e.g. Payments)

• Secondary cluster as fallback 

• Aggregate clusters for all data topics. 



Kafka Clusters 


• Scale from 100s of billions of messages/day to 1 trillion/day

• High Throughput (scale: 100s of TB to PB)

• Low Latency for most use cases (<5 ms)

• Reliability: 99.99% (#Msgs Available / #Msgs Produced)

• Multi-Language Support 

• Tens of thousands of simultaneous clients. 

• Reliable data replication across DC 




Kafka Rest Proxy 












Why Kafka Rest Proxy?


• Simplified Client API

• Multi-lang support (Java, NodeJs, Python, Golang)

• Decouple client from Kafka broker
o Thin clients = operational ease
o Fewer connections to Kafka brokers
o Easier future Kafka upgrades

• Enhanced Reliability

o Primary & Secondary Kafka Clusters



Kafka Rest Proxy: Internals 






















Kafka Rest Proxy: Internals



Async Mode



Kafka Rest Proxy: Internals


• Based on Confluent's open sourced Rest Proxy 

• Performance enhancements 

o Simple HTTP servlets on Jetty instead of Jersey
o Optimized for binary payloads
o Performance increase from 7K* to 45-50K QPS/box

• Caching of topic metadata. 

• Reliability improvements* 

o Support for Fallback cluster 

o Support for multiple Producers (SLA based segregation) 

• Plan to contribute back to community 


*Based on benchmarking & analysis done in Jun '2015 
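
The fallback-cluster idea listed above can be sketched roughly as follows. This is not the Rest Proxy's actual code: `FallbackProducer` and the `SendFn` callables are hypothetical stand-ins for real producer clients, and the retry policy is deliberately minimal.

```python
from typing import Callable

# Hypothetical signature: send(topic, payload) returns None and raises on failure.
SendFn = Callable[[str, bytes], None]


class FallbackProducer:
    """Try the primary cluster first; on any failure, write to the secondary."""

    def __init__(self, primary: SendFn, secondary: SendFn) -> None:
        self._primary = primary
        self._secondary = secondary

    def send(self, topic: str, payload: bytes) -> str:
        """Return the name of the cluster that accepted the message."""
        try:
            self._primary(topic, payload)
            return "primary"
        except Exception:
            # If the secondary also fails, its exception propagates to the caller.
            self._secondary(topic, payload)
            return "secondary"
```

A real proxy would presumably also need health checks and a way to switch back to the primary once it recovers; those parts are left out here.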



Rest Proxy: performance (1 box) 



Message rate (K/second) at single node 



















Kafka Clusters + Rest Proxy 


• Scale from 100s of billions of messages/day to 1 trillion/day

• High Throughput (scale: 100s of TB to PB)

• Low Latency for most use cases (<5 ms)

• Reliability: 99.99% (#Msgs Available / #Msgs Produced)

• Multi-Language Support 

• Tens of thousands of simultaneous clients. 

• Reliable data replication across DC 




Kafka Clients 


















Client Libraries 


• Support for multiple clusters. 

• High Throughput 

o Non-blocking, async, batching 

o <1ms produce latency for clients 

o Handles Throttling/BackOff signals from Rest Proxy 

• Topic Discovery 

o Discovers the Kafka cluster a topic belongs to
o Able to multiplex to different Kafka clusters

• Integration with Local Agent for critical data 
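
A minimal sketch of the non-blocking, batching produce path described above. The class name, the `send_batch` callable, and the `max_batch` / `linger_s` parameters are assumptions for illustration, not the real client library's API; throttling and back-off handling are omitted.

```python
import queue
import threading
from typing import Callable, List, Tuple

Message = Tuple[str, bytes]  # (topic, payload)


class BatchingProducer:
    """produce() only enqueues; a background thread ships messages in batches."""

    def __init__(self, send_batch: Callable[[List[Message]], None],
                 max_batch: int = 100, linger_s: float = 0.05) -> None:
        self._send_batch = send_batch  # hypothetical: posts one batch to the proxy
        self._max_batch = max_batch
        self._linger_s = linger_s
        self._queue: "queue.Queue[Message]" = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def produce(self, topic: str, payload: bytes) -> None:
        """Return immediately; the caller never waits on the network."""
        self._queue.put((topic, payload))

    def _run(self) -> None:
        # Keep draining until close() was requested and the queue is empty.
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                batch = [self._queue.get(timeout=self._linger_s)]
            except queue.Empty:
                continue
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            self._send_batch(batch)

    def close(self) -> None:
        """Flush whatever is still queued, then stop the worker."""
        self._stop.set()
        self._worker.join()
```

Because the queue here is unbounded, a real client would want a size cap or a hand-off to local spooling so that a slow proxy cannot exhaust memory.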



Client Libraries



Kafka Clusters + Rest Proxy + Clients


• Scale from 100s of billions of messages/day to 1 trillion/day

• High Throughput (scale: 100s of TB to PB)

• Low Latency for most use cases (<5 ms)

• Reliability: 99.99% (#Msgs Available / #Msgs Produced)

• Multi-Language Support 

• Tens of thousands of simultaneous clients. 

• Reliable data replication across DC 




Local Agent


• Local spooling in case of downstream outage/backpressure 

• Backfills at a controlled rate to avoid hammering infrastructure recovering from an outage

• Implementation:

o Reuses code from rest-proxy and Kafka's log module
o Appends all topics to same file for high throughput. 
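
The spool-then-backfill behaviour can be illustrated with a small in-memory model. It is only a sketch: a `deque` stands in for the single append-only file shared by all topics, and `forward`, `backfill_per_tick`, and the use of `ConnectionError` to signal an outage are assumptions.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Message = Tuple[str, bytes]  # (topic, payload)


class LocalAgent:
    """Forward messages downstream; spool locally while downstream is unavailable."""

    def __init__(self, forward: Callable[[str, bytes], None],
                 backfill_per_tick: int = 2) -> None:
        self._forward = forward            # hypothetical downstream send
        self._rate = backfill_per_tick     # cap on backfilled messages per tick
        self._spool: Deque[Message] = deque()  # one shared log for all topics

    def submit(self, topic: str, payload: bytes) -> str:
        if self._spool:
            # A backlog exists: queue behind it so ordering is preserved.
            self._spool.append((topic, payload))
            return "spooled"
        try:
            self._forward(topic, payload)
            return "forwarded"
        except ConnectionError:
            self._spool.append((topic, payload))
            return "spooled"

    def backfill_tick(self) -> int:
        """Replay at most `backfill_per_tick` spooled messages; return how many went out."""
        sent = 0
        while self._spool and sent < self._rate:
            topic, payload = self._spool[0]
            try:
                self._forward(topic, payload)
            except ConnectionError:
                break  # still down; try again on the next tick
            self._spool.popleft()
            sent += 1
        return sent
```

Calling `backfill_tick()` on a timer bounds the replay rate, which is the point of the controlled backfill: the recovering downstream sees a trickle rather than the whole backlog at once.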



Local Agent Architecture 



































Local Agent in Action 








Kafka Clusters + Rest Proxy + Clients + Local Agent 


• Scale from 100s of billions of messages/day to 1 trillion/day

• High Throughput (scale: 100s of TB to PB)

• Low Latency for most use cases (<5 ms)

• Reliability: 99.99% (#Msgs Available / #Msgs Produced)

• Multi-Language Support 

• Tens of thousands of simultaneous clients. 

• Reliable data replication across DC 




uReplicator 
Multi-DC data flow



Traffic from DC1


Traffic from DC2



MirrorMaker: existing



uReplicator: In-house solution



Kafka Clusters + Rest Proxy + Clients + Local Agent


• Scale from 100s of billions of messages/day to 1 trillion/day

• High Throughput (scale: 100s of TB to PB)

• Low Latency for most use cases (<5 ms)

• Reliability: 99.99% (#Msgs Available / #Msgs Produced)

• Multi-Language Support 

• Tens of thousands of simultaneous clients. 

• Reliable data replication across DC 




uReplicator 


• Running in production for 1+ year

• Open sourced: https://github.com/uber/uReplicator 

• Blog: https://eng.uber.com/ureplicator/ 



Chaperone - E2E Auditing 



Chaperone Architecture 
Chaperone: Track counts


COMPLETENESS


Count per 10min by Tier



Chaperone: Track Latency


MSG RATE AND LATENCY 


topic01 msg rate



topic01 p99 latency by tier



Chaperone


• Running in production for 1+ year

• Planning to open source in ~2 Weeks 
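
The "count per 10min by tier" completeness check can be sketched as below. This is not Chaperone's implementation; the tier names and helper functions are hypothetical, and the sketch assumes every message carries an event timestamp so that each tier buckets it into the same window.

```python
from collections import Counter
from typing import Dict

WINDOW_S = 600  # 10-minute audit windows


def window_start(event_ts: float) -> int:
    """Start (epoch seconds) of the 10-minute window containing event_ts."""
    ts = int(event_ts)
    return ts - ts % WINDOW_S


class AuditCounter:
    """Per-(tier, topic, window) message counts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, tier: str, topic: str, event_ts: float) -> None:
        self.counts[(tier, topic, window_start(event_ts))] += 1


def missing_by_window(audit: AuditCounter, topic: str,
                      upstream: str, downstream: str) -> Dict[int, int]:
    """Windows where the downstream tier saw fewer messages than the upstream tier."""
    gaps: Dict[int, int] = {}
    for (tier, t, window), seen_upstream in audit.counts.items():
        if tier == upstream and t == topic:
            seen_downstream = audit.counts.get((downstream, t, window), 0)
            if seen_upstream > seen_downstream:
                gaps[window] = seen_upstream - seen_downstream
    return gaps
```

Bucketing by event time rather than arrival time is what lets counts from different tiers be compared window by window, even when a message reaches a later tier minutes after it was produced.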



At-least Once Kafka 



Why do we need it? 
• Most of the infrastructure is tuned for high throughput
o Batching at each stage 

o Ack before produce (ack'ed != committed) 

• Single node failure in any stage leads to data loss 

• Need a reliable pipeline for High Value Data e.g. Payments 

















How did we achieve it? 


• Brokers: 

o min.insync.replicas=2, can only tolerate one node failure
o unclean.leader.election.enable=false, need to wait until the old leader comes back

• Rest Proxy: 

o Partition Failover 

• Improved Operations: 

o Replication throttling, to reduce the impact of node bootstrap
o Prevent catching-up nodes from becoming ISR
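
The broker settings above can be written out as plain key/value overrides, with a small check showing why one node failure is the limit. The replication factor of 3 is an assumption (the slides do not state it), and `acks=all` is the standard producer setting that makes `min.insync.replicas` take effect rather than something quoted from the talk.

```python
# Broker/topic-level overrides named on the slide.
BROKER_OVERRIDES = {
    "min.insync.replicas": "2",
    "unclean.leader.election.enable": "false",
}

# Producer side: min.insync.replicas is only enforced for acks=all (a.k.a. -1).
PRODUCER_OVERRIDES = {
    "acks": "all",
}


def can_accept_writes(replication_factor: int, failed_replicas: int,
                      min_insync: int = 2) -> bool:
    """True while enough replicas remain in sync to acknowledge an acks=all write."""
    return replication_factor - failed_replicas >= min_insync


if __name__ == "__main__":
    for failed in range(3):
        # replication_factor=3 is an assumed value for illustration.
        print(failed, "failed ->", can_accept_writes(3, failed))
```

With three replicas, a second failure drops the in-sync set below the minimum and the partition rejects writes instead of silently risking loss, which is the trade-off at-least-once delivery accepts.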



Operations/Tooling 



Partition Rebalancing 









































Partition Rebalancing 

• Calculates partition imbalance and inter-broker dependency.

• Generates & executes Rebalance Plan.

• Rebalance plans are incremental, can be stopped and resumed.

• Currently on-demand. 
Automated in the future. 
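
One way to picture an incremental, resumable rebalance plan is the greedy sketch below. It is a simplification, not the tool's algorithm: it balances partition counts only, ignores replicas and the inter-broker dependency mentioned above, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Assignment = Dict[int, List[str]]  # broker id -> partitions it hosts
Move = Tuple[str, int, int]        # (partition, source broker, destination broker)


def imbalance(assignment: Assignment) -> int:
    """Difference between the most and least loaded brokers, in partitions."""
    sizes = [len(parts) for parts in assignment.values()]
    return max(sizes) - min(sizes)


def next_move(assignment: Assignment) -> Optional[Move]:
    """Pick one partition to move from the fullest to the emptiest broker."""
    if imbalance(assignment) <= 1:
        return None  # already as balanced as partition counts allow
    src = max(assignment, key=lambda b: (len(assignment[b]), b))
    dst = min(assignment, key=lambda b: (len(assignment[b]), b))
    return (assignment[src][-1], src, dst)


def rebalance(assignment: Assignment, max_moves: int) -> List[Move]:
    """Apply up to max_moves single-partition moves in place and return them.

    Each call re-derives the next step from the current state, so a run can be
    stopped after any move and resumed later by calling rebalance() again.
    """
    moves: List[Move] = []
    while len(moves) < max_moves:
        move = next_move(assignment)
        if move is None:
            break
        partition, src, dst = move
        assignment[src].remove(partition)
        assignment[dst].append(partition)
        moves.append(move)
    return moves
```

Keeping each step to a single partition move is what makes the plan cheap to pause: nothing is left half-applied between calls.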












XFS vs EXT4







Summary: Scale 

• Kafka Brokers: 

o Multiple Clusters per DC 
o Use case based tuning 

• Rest Proxy to reduce connections and better batching 

• Rest Proxy & Clients 

o Batch everywhere, Async produce 
o Replace Jersey with Jetty 

• XFS



Summary: Reliability 


• Local Agent 

• Secondary Clusters 

• Multi Producer support in Rest Proxy 

• uReplicator 

• Auditing via Chaperone 



Future Work 


• Open source contribution 
o Chaperone 

o Toolkit 

• Data Lineage 

• Active Active Kafka 

• Chargeback 

• Exactly once mirroring via uReplicator 



Questions?


ankur@uber.com 


Extra Slides 
Kafka Durability (acks=1)



Kafka


Distributed Messaging system 

• High throughput 

• Low latency 

• Scalable 

• Centralized 

• Real-time 



* Supported in Kafka 0.8+ 





























What is Kafka?


• Distributed

• Partitioned

• Replicated

• Commit Log








Kafka Concepts 



active replica (id y) of partition x 
for topic "zerg.hydra" 


active replica (id y) of partition x, 
this broker is leader for that partition 